How to Choose the Number of Clusters: The Cramer Multiplicity Solution

Author

Adriana Climescu-Haulica

Abstract

Most analyses of microarray gene expression data use a clustering algorithm, either as a preprocessing step, for example in genomic functional analysis [6], or as the main discriminating tool, as in tumor classification studies [9]. While from the experimenter's point of view the simplest clustering method may be the best, it is acknowledged [8] that the reliability of allocating units to clusters and, above all, the number of clusters are questions still awaiting joint theoretical and practical validation. Model-based clustering methods [10], as well as machine learning methods [3, 2], lack an a priori technique for determining the number of clusters. Usually the number of clusters is fixed a posteriori so that some reliability criterion is maximized, a procedure that depends strongly on the clustering method. Our work on clustering algorithms revealed an objective method for choosing the number of clusters, inspired by spectral learning algorithms and Cramer multiplicity. The clustering problem is mapped onto the framework of spectral graph theory by means of the min-cut problem. This induces a passage from the discrete domain to the continuous one, through the definition of eigenfunctions associated with the Laplacian operator of the graph. It is the analysis in the continuous domain that makes the Cramer multiplicity detectable; in the discrete case it is otherwise fixed at 1. An algorithm that approximates the number of clusters is implemented using Legendre polynomials. The test set is the yeast cell data [4].
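For readers new to the spectral view, the sketch below shows the standard discrete-domain starting point that the min-cut mapping relies on: build a similarity graph, form its Laplacian, and read a candidate number of clusters from the eigenvalue spectrum with the common eigengap heuristic. It is not the Cramer multiplicity method or its Legendre polynomial approximation, which the abstract does not describe in reproducible detail. The Gaussian affinity, the bandwidth `sigma`, the cutoff `k_max`, the function names, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Dense similarity matrix W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    The Gaussian kernel and sigma are illustrative choices, not from the paper.
    """
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # drop self-loops
    return W


def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    return np.diag(W.sum(axis=1)) - W


def estimate_k_eigengap(W, k_max=10):
    """Eigengap heuristic (hypothetical helper): return (k, eigenvalues), where k
    marks the largest gap among the k_max + 1 smallest Laplacian eigenvalues."""
    # L is symmetric, so eigvalsh returns real eigenvalues in ascending order
    eigvals = np.linalg.eigvalsh(graph_laplacian(W))
    k_max = min(k_max, len(eigvals) - 1)
    gaps = np.diff(eigvals[: k_max + 1])
    # gaps[i] = eigvals[i + 1] - eigvals[i]; a large gap right after index k - 1
    # suggests k nearly disconnected groups
    return int(np.argmax(gaps)) + 1, eigvals


if __name__ == "__main__":
    # Synthetic data (illustrative only): three well-separated 2-D blobs
    rng = np.random.default_rng(0)
    centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
    X = np.vstack([rng.normal(c, 0.1, size=(20, 2)) for c in centers])
    k, eigvals = estimate_k_eigengap(gaussian_affinity(X, sigma=0.5))
    print("estimated number of clusters:", k)
    print("smallest eigenvalues:", np.round(eigvals[:5], 4))
```

When the graph splits into exactly k disconnected components, the eigenvalue 0 of the Laplacian has multiplicity k; the eigengap heuristic extends this to nearly disconnected graphs, but its answer still depends on the choice of similarity graph and `sigma`.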

Similar sources

Assessment of Clustering Methods for Predicting Permeability in a Heterogeneous Carbonate Reservoir

Permeability, the ability of rocks to flow hydrocarbons, is directly determined from core. Due to high cost associated with coring, many techniques have been suggested to predict permeability from the easy-to-obtain and frequent properties of reservoirs such as log derived porosity. This study was carried out to put clustering methods (dynamic clustering (DC), ascending hierarchical clustering ...

Automatic Data Clustering Using an Improved Imperialist Competitive Algorithm

Imperialist Competitive Algorithm (ICA) is considered as a prime meta-heuristic algorithm to find the general optimal solution in optimization problems. This paper presents a use of ICA for automatic clustering of huge unlabeled data sets. By using proper structure for each of the chromosomes and the ICA, at run time, the suggested method (ACICA) finds the optimum number of clusters while optim...

Model the allocation of productive financial resources from the perspective of livelihood poverty indicators using a combination of clustering methods and SAW technique

Poverty is a social, economic, cultural and political reality that has long been one of the greatest human problems. The diversity of problems, needs and problems of the deprived and low-income groups of the society and the multiplicity of poverty indicators on the one hand, and on the other hand the lack of financial resources and credits to solve the poverty indicators, organizations in charg...

Improved Cramer-Rao Inequality for Randomly Censored Data

As an application of the improved Cauchy-Schwarz inequality due to Walker (Statist. Probab. Lett. (2017) 122:86-90), we obtain an improved version of the Cramer-Rao inequality for randomly censored data derived by Abdushukurov and Kim (J. Soviet. Math. (1987) pp. 2171-2185). We derive a lower bound of Bhattacharya type for the mean square error of a parametric function based on randomly censor...

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have so many applications in real world problems. When we deal with huge volume of data, analyzing data is difficult or sometimes impossible. In big data problems, clustering data is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graph but we do not have any choice to select the number of clusters and the number of members ...

Publication date: 2006